Renyi Differential Privacy of The Subsampled Shuffle Model In Distributed Learning

Neural Information Processing Systems

We study privacy in a distributed learning framework, where clients collaboratively build a learning model iteratively through interactions with a server from whom we need privacy. Motivated by stochastic optimization and the federated learning (FL) paradigm, we focus on the case where a small fraction of data samples are randomly sub-sampled in each round to participate in the learning process, which also enables privacy amplification. To obtain even stronger local privacy guarantees, we study this in the shuffle privacy model, where each client randomizes its response using a local differentially private (LDP) mechanism and the server only receives a random permutation (shuffle) of the clients' responses without their association to each client. The principal result of this paper is a privacy-optimization performance trade-off for discrete randomization mechanisms in this sub-sampled shuffle privacy model. This is enabled through a new theoretical technique to analyze the Renyi Differential Privacy (RDP) of the sub-sampled shuffle model. We numerically demonstrate that, for important regimes, with composition our bound yields a significant improvement in the privacy guarantee over the state-of-the-art approximate Differential Privacy (DP) guarantee (with strong composition) for sub-sampled shuffled models. We also demonstrate numerically a significant improvement in the privacy-learning performance operating point using real data sets. Despite these advances, an open question is to bridge the gap between the lower and upper privacy bounds in our RDP analysis.
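To make the protocol concrete, the following is a minimal sketch of one round of the sub-sampled shuffle model described above. It assumes a k-ary randomized-response local randomizer and uniform client sub-sampling; the function and parameter names are illustrative and not taken from the paper.

```python
# Minimal sketch of one round of the sub-sampled shuffle model (illustrative only).
# Assumptions not from the paper: k-ary randomized response as the LDP mechanism,
# uniform sub-sampling of clients without replacement.
import numpy as np

def k_rr(value: int, k: int, eps: float, rng: np.random.Generator) -> int:
    """k-ary randomized response: keep the true value w.p. e^eps / (e^eps + k - 1),
    otherwise report one of the other k - 1 values uniformly at random."""
    p_true = np.exp(eps) / (np.exp(eps) + k - 1)
    if rng.random() < p_true:
        return int(value)
    other = int(rng.integers(0, k - 1))
    return other if other < value else other + 1

def subsampled_shuffle_round(client_values, k, eps, sample_frac, rng):
    """Sub-sample clients, apply the local randomizer, and return a shuffled multiset."""
    n = len(client_values)
    idx = rng.choice(n, size=max(1, int(sample_frac * n)), replace=False)  # sub-sampling
    reports = [k_rr(client_values[i], k, eps, rng) for i in idx]           # LDP randomization
    rng.shuffle(reports)                                                   # anonymizing shuffle
    return reports  # the server only sees this unordered, unattributed list

rng = np.random.default_rng(0)
clients = rng.integers(0, 10, size=1000)  # toy discrete client data, k = 10
print(subsampled_shuffle_round(clients, k=10, eps=1.0, sample_frac=0.05, rng=rng)[:10])
```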


Correcting Mean Bias in Text Embeddings: A Refined Renormalization with Training-Free Improvements on MMTEB

Ren, Xingyu, Sun, Youran, Liang, Haoyu

arXiv.org Artificial Intelligence

We find that current text embedding models produce outputs with a consistent bias, i.e., each embedding vector $e$ can be decomposed as $\tilde{e} + \mu$, where $\mu$ is almost identical across all sentences. We propose a plug-and-play, training-free, and lightweight solution called Renormalization. Through extensive experiments, we show that renormalization consistently and statistically significantly improves the performance of existing models on the Massive Multilingual Text Embedding Benchmark (MMTEB). In particular, across 38 models, renormalization improves performance by 9.7 $\sigma$ on retrieval tasks, 3.1 $\sigma$ on classification tasks, and 0.8 $\sigma$ on other types of tasks. Renormalization has two variants: directly subtracting $\mu$ from $e$, or subtracting the projection of $e$ onto $\mu$. We theoretically predict that the latter performs better, and our experiments confirm this prediction.
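The two variants map directly to a few lines of linear algebra. Below is a minimal sketch, assuming $\mu$ is estimated as the mean embedding over a reference corpus (an assumption on our part; the paper may estimate it differently), with toy data standing in for real sentence embeddings.

```python
# Minimal sketch of the two renormalization variants described above.
# Assumption (ours, not necessarily the paper's): mu is the mean embedding
# over a reference corpus.
import numpy as np

def estimate_mu(embeddings: np.ndarray) -> np.ndarray:
    """embeddings: (n, d) matrix of sentence embeddings."""
    return embeddings.mean(axis=0)

def renorm_subtract(e: np.ndarray, mu: np.ndarray) -> np.ndarray:
    """Variant 1: subtract the shared bias directly."""
    return e - mu

def renorm_project(e: np.ndarray, mu: np.ndarray) -> np.ndarray:
    """Variant 2: remove only the component of e along mu (the better-performing variant)."""
    u = mu / np.linalg.norm(mu)
    return e - (e @ u) * u

emb = np.random.default_rng(0).normal(size=(100, 384)) + 0.5   # toy embeddings with a shared offset
mu = estimate_mu(emb)
e0 = renorm_project(emb[0], mu)
print(np.allclose(e0 @ (mu / np.linalg.norm(mu)), 0.0))        # component along mu is removed
```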



Improving Virtual Contrast Enhancement using Longitudinal Data

Fayolle, Pierre, Bône, Alexandre, Debs, Noëlie, Robert, Philippe, Bourdon, Pascal, Guillevin, Remy, Helbert, David

arXiv.org Artificial Intelligence

Gadolinium-based contrast agents (GBCAs) are widely used in magnetic resonance imaging (MRI) to enhance lesion detection and characterisation, particularly in the field of neuro-oncology. Nevertheless, concerns regarding gadolinium retention and accumulation in brain and body tissues, most notably for diseases that require close monitoring and frequent GBCA injection, have led to the need for strategies to reduce dosage. In this study, a deep learning framework is proposed for the virtual contrast enhancement of full-dose post-contrast T1-weighted MRI images from corresponding low-dose acquisitions. The contribution of the presented model is its utilisation of longitudinal information, which is achieved by incorporating a prior full-dose MRI examination from the same patient. A comparative evaluation against a non-longitudinal single session model demonstrated that the longitudinal approach significantly improves image quality across multiple reconstruction metrics. Furthermore, experiments with varying simulated contrast doses confirmed the robustness of the proposed method. These results emphasize the potential of integrating prior imaging history into deep learning-based virtual contrast enhancement pipelines to reduce GBCA usage without compromising diagnostic utility, thus paving the way for safer, more sustainable longitudinal monitoring in clinical MRI practice.
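One simple way longitudinal information could enter such a model is by concatenating the (co-registered) prior full-dose exam with the current low-dose acquisition as extra input channels. The sketch below illustrates only that idea; it is an assumed toy architecture and input format, not the architecture proposed in the paper.

```python
# Illustrative sketch (assumed, not the paper's architecture): feed the prior
# full-dose exam as an additional input channel alongside the low-dose image.
import torch
import torch.nn as nn

class LongitudinalEnhancer(nn.Module):
    def __init__(self, channels: int = 32):
        super().__init__()
        # 2 input channels: [current low-dose slice, prior full-dose slice]
        self.net = nn.Sequential(
            nn.Conv2d(2, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),  # predicted full-dose slice
        )

    def forward(self, low_dose: torch.Tensor, prior_full_dose: torch.Tensor) -> torch.Tensor:
        x = torch.cat([low_dose, prior_full_dose], dim=1)  # (B, 2, H, W)
        return self.net(x)

model = LongitudinalEnhancer()
low = torch.randn(1, 1, 128, 128)     # toy low-dose T1-weighted slice
prior = torch.randn(1, 1, 128, 128)   # toy prior full-dose slice, assumed co-registered
print(model(low, prior).shape)        # torch.Size([1, 1, 128, 128])
```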


given the time- and space-bounded aspects of the rebuttal, hoping we clarified the main questions of the reviewers

Neural Information Processing Systems

We thank the four reviewers for their insightful comments and suggestions.

"I looked into the paper in ref [12] ...": In [12], the greedy algorithm is generic, with no assumptions about models.

"...": Random search leads to a set of ... For Tab. 1, we ran the Wilcoxon signed-rank test (paired along settings, datasets, and model types). For Tab. 2 (with more costly experiments), we do not have enough runs to apply such a test; we nonetheless report the standard errors in the paper, which seem to indicate significant improvements.

"...": Those numbers indicate the size of the ensemble; we will clarify this point.

"...": We thank R1 for the idea and ran our entire benchmark for ResNet-20: ...

"...": Hyper ensembles can indeed be viewed as a mixture ... They typically use Bayesian nonparametric priors/posteriors and MCMC; we use mixtures and SGD.

"...": When used with replacement, the greedy algorithm from Caruana et al. [12, Sec. ...
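For reference, a paired Wilcoxon signed-rank test of the kind mentioned above can be run as in the minimal sketch below; the scores are toy numbers standing in for per-(setting, dataset, model type) results, not values from the paper.

```python
# Minimal sketch of the paired Wilcoxon signed-rank test mentioned above.
# The score lists are toy data, not results from the paper.
from scipy.stats import wilcoxon

# Paired scores for two methods across (setting, dataset, model type) combinations.
baseline        = [0.712, 0.688, 0.754, 0.731, 0.699, 0.742, 0.718, 0.705]
greedy_ensemble = [0.724, 0.701, 0.761, 0.733, 0.712, 0.748, 0.725, 0.715]

stat, p_value = wilcoxon(greedy_ensemble, baseline, alternative="greater")
print(f"W = {stat:.1f}, p = {p_value:.4f}")  # small p: improvement unlikely to be due to chance
```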